Ethan Fosse
October 11, 2016
Research Associate, Department of Sociology
Workshop files: States.RData and ggplot2workshop.html
R packages cover everything from text mining (tm package) to data visualization (ggplot2 package) to data wrangling (dplyr package)
To install a package in RStudio, go to the Packages tab and click Install
Or install from the console: install.packages("package_name")
package_name is just the name of the R package in quotes
install.packages("ggplot2")
library(ggplot2)
Later we'll examine if education might help us understand variation in voting across the United States
We'll use data to answer this question!
Load States.RData into our workspace
Using RStudio's user-friendly interface:
Open the file from the folder where you saved it (for example, C:/Folder/)
You can also try this R Code:
load("C:/Folder/States.RData")
Inspect the data with View()
View(States)
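Note that load() expects the file path as a quoted string. The sketch below illustrates the round trip with a small made-up data frame; `States_demo`, its values, and the temporary file are assumptions for illustration only, not the workshop data.

```r
# Hypothetical stand-in for the workshop data: two made-up rows.
States_demo <- data.frame(
  State = c("Alpha", "Beta"),
  Population = c(1.5, 7.2)
)

# Save to a temporary .RData file, then remove the object from the workspace.
demo_file <- file.path(tempdir(), "States_demo.RData")
save(States_demo, file = demo_file)
rm(States_demo)

# load() takes the path as a quoted string and returns the names it restored.
restored <- load(demo_file)
print(restored)

# str() gives a compact, console-friendly alternative to View().
str(States_demo)
```

With the real file, the same pattern would apply with the path to States.RData on your own machine.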
ggplot2 has two main plotting functions: qplot() and ggplot()
We'll focus on qplot(), which stands for “quick plot”
qplot() is very powerful and has a similar syntax to the base R functions
qplot() is a function
The qplot() function has the following syntax:
qplot(x, y, data, color, shape, size, facets, geom, stat)
x and y: the variables to plot
data: your data set
color, shape, and size: aesthetic arguments
facets: optional splitting (or “faceting”) into subplots
geom: the actual visualization of the data (such as point or line)
stat: any statistical summaries to be applied
qplot(data=States, x=Population, geom="histogram", bins=10)
What happens if you change bins = 10 to bins = 20 or bins = 5?
qplot(data=States, x=Population, geom="density")
Use the I() function to specify a particular color
The I() function tells ggplot2 to treat the value as a literal color, not as a variable to map to a color scale
qplot(data=States, x=Population, geom="histogram", color=I("red"), bins=20)
qplot(data=States, x=Population, geom="density", color=I("red"))
qplot(data=States, x=Population, geom="histogram", fill=I("red"), bins=20)
qplot(data=States, x=Population, geom="density", fill=I("red"))
qplot(data=States, x=Population, geom="histogram", color=ObamaMcCain, bins=20)
qplot(data=States, x=Population, geom="density", color=ObamaMcCain)
qplot(data=States, x=Population, geom="histogram", fill=ObamaMcCain, bins=20)
qplot(data=States, x=Population, geom="density", fill=ObamaMcCain)
qplot(data=States, x=Population, geom="histogram", fill=ObamaMcCain, bins=20, alpha=I(0.5))
qplot(data=States, x=Population, geom="density", fill=ObamaMcCain, alpha=I(0.5))
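For comparison, the same kind of overlaid histogram and density plot can be written with ggplot(), the other plotting function in ggplot2. This is only a sketch: the `toy` data frame and its values are made up so the code runs without States.RData, and the column names simply mirror the ones used in the slides.

```r
library(ggplot2)

# Hypothetical toy data standing in for States (values are made up).
set.seed(1)
toy <- data.frame(
  Population = c(rexp(25, rate = 1 / 4), rexp(25, rate = 1 / 7)),
  ObamaMcCain = rep(c("McCain", "Obama"), each = 25)
)

# Variables go inside aes(); fixed settings such as alpha go outside it,
# which plays the role that I() plays in qplot().
p_hist <- ggplot(toy, aes(x = Population, fill = ObamaMcCain)) +
  geom_histogram(bins = 20, alpha = 0.5)

p_dens <- ggplot(toy, aes(x = Population, fill = ObamaMcCain)) +
  geom_density(alpha = 0.5)

print(p_hist)
print(p_dens)
```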
What happens if you change alpha to 0.2? How about 0.8?
What if you set alpha to 1? How about 0?
qplot(data=States, x=HighSchool, geom="density", fill=Region, alpha=I(0.5))
We'll use the recode command from the car package
First install the car R package and then load it
install.packages("car")
library("car")
States$Rich <- car::recode(States$HouseholdIncome, "lo:55000='Poor'; 55001:hi = 'Rich'", as.factor.result=TRUE)
table(States$Rich)
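If the car package is not available, a similar two-category split can be sketched in base R with cut(). The `income` vector below is made up for illustration; the cutoff of 55000 is the one used in the recode call.

```r
# Hypothetical household incomes (made-up values).
income <- c(42000, 55000, 55001, 70250)

# Intervals are closed on the right by default, so 55000 itself is "Poor"
# and anything above 55000 is "Rich".
rich <- cut(income,
            breaks = c(-Inf, 55000, Inf),
            labels = c("Poor", "Rich"))

# Tabulate the resulting factor, as with table(States$Rich).
table(rich)
```

One difference worth keeping in mind: cut() assigns every value to a category, including values that fall strictly between 55000 and 55001.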
qplot(data=States, x=Rich, geom="bar")
qplot(data=States, x=Rich, color=Rich, geom="bar")
qplot(data=States, x=Rich, fill=Rich, geom="bar")
qplot(data=States, x=Rich, color=Region, geom="bar")
qplot(data=States, x=Rich, fill=Region, geom="bar")
The facets argument takes a formula of the form row_variable ~ column_variable
To facet on only one variable, use row_variable ~ . or . ~ column_variable
The . indicates that we're not conditioning on the columns or rows, respectively
qplot(data = States, x = Region, fill = Region, geom = "bar", facets = Rich ~ .)
qplot(data = States, x = Region, fill = Region, geom = "bar", facets = . ~ Rich)
qplot(data = States, x =Region, fill = Region, geom = "bar", facets = ObamaMcCain ~ Rich)
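The same row ~ column formula also works with facet_grid() when a plot is built with ggplot(). The sketch below uses a small made-up data frame (`toy` is hypothetical, with column names borrowed from the slides) to show the three layouts side by side.

```r
library(ggplot2)

# Hypothetical toy data (made-up rows) with the column names used above.
toy <- data.frame(
  Region = c("South", "South", "West", "West", "Midwest", "Northeast"),
  Rich = c("Poor", "Rich", "Poor", "Rich", "Poor", "Rich"),
  ObamaMcCain = c("McCain", "Obama", "McCain", "Obama", "Obama", "Obama")
)

base_plot <- ggplot(toy, aes(x = Region, fill = Region)) + geom_bar()

# Rich ~ .  : one row of panels per level of Rich, no column split.
p_rows <- base_plot + facet_grid(Rich ~ .)

# . ~ Rich  : one column of panels per level of Rich, no row split.
p_cols <- base_plot + facet_grid(. ~ Rich)

# ObamaMcCain ~ Rich : rows by ObamaMcCain, columns by Rich.
p_both <- base_plot + facet_grid(ObamaMcCain ~ Rich)

print(p_both)
```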
qplot(data=States, x=Rich, fill=Region, geom="bar", xlab="Income Category", ylab="Number of States")
We can add axis labels to any qplot, not just bar plots
Try this R Code:
qplot(data=States, x=Population, geom="histogram", bins=10, xlab="Population (in Millions)", ylab="Number of States")
qplot(data=States, x=Rich, fill=Region, geom="bar", main="Bar Plot of Income Category by Region")
We can add titles to any qplot, not just bar plots
Try this R Code:
qplot(data=States, x=Population, geom="histogram", bins=10, main="Histogram of Population")
Create a bar plot with ggplot2 to answer this question
R Code Hint:
qplot(data = States, x = Rich, fill = Rich, geom = "bar", facets = . ~ ObamaMcCain)
Box plots are great for examining the distribution of a continuous variable by levels of a categorical variable
Try this R Code:
qplot(data=States, x=ObamaMcCain, y=HouseholdIncome, geom="boxplot")
qplot(data=States, x=ObamaMcCain, y=GSP, geom="boxplot")
qplot(data=States, x=ObamaMcCain, y=GSP, fill=ObamaMcCain, geom="boxplot")
We can also overlay the box plots with points by passing several geoms with the c() function
Try this R Code:
qplot(data=States, x=ObamaMcCain, y=HouseholdIncome, geom=c("boxplot","point"))
qplot(data=States, x=ObamaMcCain, y=HouseholdIncome, geom=c("boxplot","jitter"))
What is the difference between point versus jitter?
We can add text labels with the label option and by specifying text as a geom object to be plotted
The size option controls the size of the text labels
qplot(data=States, x=ObamaMcCain, y=HouseholdIncome, geom="boxplot")
qplot(data=States, x=ObamaMcCain, label=State, y=GSP, geom=c("boxplot","text"))
Often skewed distributions are logged
Try this R Code:
qplot(data=States, x=ObamaMcCain, y=Population, geom="boxplot")
qplot(data=States, x=ObamaMcCain, y=Population, log="y", geom="boxplot")
Between those states that went for McCain versus Obama, is there a difference in the percentage that went to college?
R Code Hint:
qplot(data=States, x=ObamaMcCain, y=College, geom=c("boxplot","point"))
qplot(data=States, x=ObamaMcCain, y=College, label=State, geom=c("boxplot","text"))
How does College vary by ObamaMcCain?
Scatter plots let us examine the relationships among HouseholdIncome, College, and NonWhite
Try this R Code:
qplot(data=States, x=College, y=HouseholdIncome, geom="point")
cor(States$College, States$HouseholdIncome)
qplot(data=States, x=College, y=NonWhite, geom="point")
cor(States$College, States$NonWhite)
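By default, cor() returns the Pearson correlation coefficient. As a worked illustration, the sketch below computes it by hand and checks the result against cor(); the two vectors are made-up numbers, not values from States.

```r
# Hypothetical values: percent with a college degree and income in thousands.
college <- c(20, 25, 30, 35, 40)
income <- c(40, 46, 50, 58, 61)

# Pearson r: sum of cross-products of deviations, divided by the square root
# of the product of the two sums of squared deviations.
dev_college <- college - mean(college)
dev_income <- income - mean(income)
r_manual <- sum(dev_college * dev_income) /
  sqrt(sum(dev_college^2) * sum(dev_income^2))

# The hand calculation should agree with the built-in function.
print(r_manual)
print(cor(college, income))
```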
We can add a smoothed line over the graph, with a shaded band based on standard errors to represent sampling uncertainty
Try this R Code:
qplot(data=States, x=College, y=HouseholdIncome, geom=c("point","smooth"))
qplot(data=States, x=College, y=NonWhite, geom=c("point","smooth"))
We can also replace the points with text labels
Try this R Code:
qplot(data=States, x=College, y=HouseholdIncome, label=State, geom=c("smooth", "text"))
qplot(data=States, x=College, y=NonWhite, label=State, geom=c("smooth", "text"))
Consider HouseholdIncome, College, and NonWhite
Which of these variables do you think is most predictive of the percentage voting for Obama?
R Code Hint:
qplot(data=States, x=HouseholdIncome, y=ObamaVote, label=State, geom=c("text","smooth"))
qplot(data=States, x=College, y=ObamaVote, label=State, geom=c("text","smooth"))
qplot(data=States, x=NonWhite, y=ObamaVote, label=State, geom=c("text","smooth"))
The ggplot2 function map_data() provides data on the latitude and longitude of U.S. states (it requires the maps package to be installed)
Try this R Code:
USAMap <- map_data("state")
str(USAMap)
View(USAMap)
Our goal is to merge these two data sets, which requires a common variable
Try this R Code:
States$region <- tolower(States$State)
StatesMerged <- merge(x=USAMap, y=States, by = "region")
str(StatesMerged)
What variables are in StatesMerged?
StatesMerged also has information on latitude, longitude, and a variable called order
We need to sort the rows by this variable using the order() function
StatesMerged <- StatesMerged[order(StatesMerged$order), ]
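To see why the rows are re-sorted after merging, here is a minimal sketch with two made-up data frames. `map_demo` and `states_demo` are hypothetical stand-ins for USAMap and States, and the coordinates are invented. By default merge() sorts its result on the key column, so the polygon drawing sequence stored in order can get scrambled.

```r
# Hypothetical map outline: three points per region, drawn in sequence 1..6.
map_demo <- data.frame(
  region = c("beta", "beta", "beta", "alpha", "alpha", "alpha"),
  long = c(0, 1, 1, 2, 3, 3),
  lat = c(0, 0, 1, 0, 0, 1),
  order = 1:6
)

# Hypothetical state-level data with capitalized names.
states_demo <- data.frame(
  State = c("Alpha", "Beta"),
  College = c(31.2, 24.8)
)

# Create the common key, as with States$region <- tolower(States$State).
states_demo$region <- tolower(states_demo$State)

merged <- merge(x = map_demo, y = states_demo, by = "region")
print(merged$order)  # rows come back sorted by region, not by drawing order

# Restore the drawing sequence before plotting polygons.
merged_sorted <- merged[order(merged$order), ]
print(merged_sorted$order)
```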
Now we're ready to create a map!
Try this R Code:
qplot(data = StatesMerged, x=long, y=lat, group = group, fill = HouseholdIncome, geom = "polygon")
We'll use scale_fill_gradient() to alter the color gradient
Try this R Code:
qplot(data = StatesMerged, x=long, y=lat, group = group, fill = College, geom = "polygon") + scale_fill_gradient(low="red", high="green")
If you've followed along, then you should already have the data set ready!
R Code Hint:
qplot(data = StatesMerged, x=long, y=lat, group = group, fill = ObamaVote, geom = "polygon") + scale_fill_gradient(low="red", high="blue")
URL: https://compass-workshops.github.io/info/
Email List: Send an email to listserv@lists.princeton.edu with “Subscribe COMPASSWORKSHOPS” in the body and all other lines blank, including the subject
| Date | Topic |
|---|---|
| September 20 | Introduction to R and RStudio |
| September 27 | Data Wrangling in R |
| October 4 | Base R Graphics |
| October 11 | Data Visualization in R with ggplot2 |
| October 18 | Programming Loops in R |
| November 8 | Probability and Simulations in R |
| November 15 | Monte Carlo Simulations in R |
| November 29 | Text Analysis in R |
| December 6 | Hypothesis Testing in R |
| December 13 | Regression Analysis in R |